Ignore zfs_arc_shrinker_limit in direct reclaim mode #16313
Conversation
I run with zfs_arc_shrinker_limit=0 for the same underlying reason, and I like this nuanced approach more.
Could you share why you are changing it only for direct reclaim? From my reading of the 6.6.20 kernel code, the direct and regular reclaim paths use mostly the same code, so I'd expect them to have similar problems. What kernel are you using? Speaking about tests, just yesterday I found a more reliable reproduction of the ARC problems we've hit with the 6.6 kernel.
For this specific test VM, I am using kernel 6.1.0-22-amd64, without MGLRU. The main issue I would like to avoid (which is especially evident on MGLRU-enabled RHEL 9 kernels, such as 5.14.0-427.22.1.el9_4.x86_64) is excessive swap file utilization when the ARC should be shrunk, or OOM when no swap is available. On the other hand, setting zfs_arc_shrinker_limit=0 exposes the ARC to collapse from even small reclaim requests. This patch tries to both avoid OOM due to a non-shrinking ARC when no swap is available and ARC collapse due to small direct memory reclaims. I tried on a non-MGLRU kernel to keep the first tests simple, but I plan to test on RHEL 9 also. My understanding of reclaim is that …
This matches my observations. Thanks.
I am testing another approach:
Any thoughts? EDIT: never mind.
The 10000 default is definitely too low, since the kernel does not always evict that amount, but uses it as a base to which it applies memory pressure, starting from 1/1024th and increasing. It means that by the time the kernel actually requests the full 10000 pages, it will already have evicted everything it can from the page cache, which is not exactly the goal here. If I were to think about the value, I would probably make it a fraction of arc_c, since no fixed value will ever be right (either too small for most systems, or too big for small ones).
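As a purely hypothetical illustration of that "fraction of arc_c" idea (the helper name and the 1/64 ratio below are invented for this example, not something proposed in this PR):

```c
/*
 * Hypothetical alternative: derive the reclaim cap from the current
 * ARC target size instead of a fixed page count, so the limit scales
 * with the system instead of being too small on large boxes or too
 * big on small ones.
 */
static uint64_t
arc_shrinker_dynamic_limit(void)
{
	/* 1/64th of the ARC target size, in pages (ratio is arbitrary). */
	uint64_t scaled = btop(arc_c) >> 6;

	/* Never report less than the static tunable would allow. */
	return (MAX(scaled, (uint64_t)zfs_arc_shrinker_limit));
}
```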
Reading your response above, I am still not getting why you suggest that we should ignore indirect memory pressure but partially obey direct. The algorithm is pretty much the same there. Do you have actual data showing that it helps?
Good. And what does this patch do to it?
Even worse, if it has no more pagecache to evict and no swap, it invokes OOM.
Do we have something similar in the form of
Maybe I am wrong, but this patch does not ignore indirect memory pressure any more than the current behavior. If anything, the patch as shown in #16313 (comment) would increase reactivity to indirect memory reclaim. As a side note, it restores correct behavior for echo 3 > /proc/sys/vm/drop_caches.
This patch does nothing to prevent excessive swap, but it greatly helps to prevent OOM on machines without swap. That was my intent, at least - am I missing something? Thanks.
I was thinking about
I didn't mean that the patch ignores indirect more, but that indirect …
The original patch version (the one committed) disables the limit only for direct reclaim.
It seems to me that in no way am I ignoring the limit for indirect reclaim initiated by kswapd. Thanks.
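For context, the distinction being discussed comes down to a process flag on the reclaiming task; in recent mainline kernels the helper in include/linux/swap.h is essentially:

```c
/*
 * kswapd threads run with PF_KSWAPD set, so a shrinker callback can
 * tell background (indirect) reclaim apart from direct reclaim done
 * in the context of the allocating task itself.
 */
static inline int current_is_kswapd(void)
{
	return current->flags & PF_KSWAPD;
}
```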
Well, I had a basic misunderstanding on what … That said, the patch as-committed (with no limit for direct reclaim) …
I resolved a conflict with the latest code. @amotin the way I understand the code, this patch effectively limits reclaim to … Thanks.
Reviewing some data from our performance team with zfs_arc_shrinker_limit=0, I see a case where the system has 2/3 of RAM free, but something (I guess it may be kswapd) requests ARC eviction. It makes me understand your wish to restrict it specifically. But without really understanding what is going on there, I am not exactly comfortable proposing a workaround, since it clearly cannot be called a proper solution. I'll need to investigate it. One thing specific to that system is that it is 2-node NUMA, which makes me wonder if the kernel may try to balance it or do something similar.
Meanwhile, if we go your way, I would simplify the patch to what is proposed below. And please rebase and collapse your commits; a 5-line change is not worth 4 commits.
@shodanshok You should rebase your branch onto master, not merge.
zfs_arc_shrinker_limit (default: 10000) avoids ARC collapse due to excessive memory reclaim. However, when the kernel is in direct reclaim mode (i.e. low on memory), limiting ARC reclaim increases OOM risk. This is especially true on systems without (or with inadequate) swap. This patch ignores zfs_arc_shrinker_limit when the kernel is in direct reclaim mode, avoiding most OOMs. It also restores the ability of "echo 3 > /proc/sys/vm/drop_caches" to correctly drop (almost) all ARC. Signed-off-by: Gionatan Danti <[email protected]>
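For readers following along, a minimal sketch of what the change boils down to, assuming the shape of the shrinker count callback in module/os/linux/zfs/arc_os.c (names are taken from the discussion; this is an illustration, not the committed diff):

```c
/*
 * Sketch: report everything evictable to direct reclaim, but cap what
 * is offered to background (kswapd) reclaim at zfs_arc_shrinker_limit.
 */
static unsigned long
arc_shrinker_count(struct shrinker *shrink, struct shrink_control *sc)
{
	/* Pages the ARC could hand back right now. */
	int64_t can_free = btop(arc_evictable_memory());

	/* Honor the tunable only for indirect (kswapd) reclaim. */
	int64_t limit = (zfs_arc_shrinker_limit != 0 && current_is_kswapd()) ?
	    zfs_arc_shrinker_limit : INT64_MAX;

	return (MIN(can_free, limit));
}
```

The count callback only tells the kernel how much is reclaimable; the actual eviction happens in the corresponding scan callback, so lifting the cap here simply stops direct reclaim (and drop_caches) from being told the ARC has almost nothing to give.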
@amotin you are right, sorry. I hope the last force-push (after rebase) is what you asked for. Thanks.
You are right, it is not a proper solution. The real solution would be to never ignore kernel reclaim requests, but setting zfs_arc_shrinker_limit=0 brings back the issues described above. Still, ignoring the limit only in direct reclaim mode avoids the worst outcome (OOM).
NUMA can play a role, but I see similar behavior on single-socket (and even single-core) machines. Regards.
Do we have any updated results for how this change has been working in practice? It makes good sense to me, but I'd like to know a little more about how it's been tested.
@behlendorf I have done some testing (simulating memory pressure and reclaim) on both Debian 12 and Rocky 9 virtual machines, but nothing more. Is there anything specific you would like to test on these machines? Anyway, the whole idea is based on … Thanks.
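For anyone wanting to reproduce this kind of testing, one simple way to trigger direct reclaim is to allocate and touch more anonymous memory than is currently free while watching /proc/spl/kstat/zfs/arcstats in another shell. A throwaway example (not part of the PR; the MiB argument is just for illustration):

```c
/* alloc_pressure.c: touch N MiB of anonymous memory to force reclaim. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	size_t mib = (argc > 1) ? strtoul(argv[1], NULL, 10) : 1024;
	size_t len = mib << 20;
	char *buf = malloc(len);

	if (buf == NULL) {
		perror("malloc");
		return (1);
	}
	/* Touch every page so the kernel must actually back it with RAM. */
	for (size_t off = 0; off < len; off += 4096)
		buf[off] = 1;

	printf("holding %zu MiB; watch arcstats in another shell\n", mib);
	sleep(60);
	free(buf);
	return (0);
}
```

Running it with a size somewhat above MemFree on a box with a warm ARC shows whether the ARC shrinks (or the OOM killer fires) with and without the patch.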
@shodanshok you've got it right. Thanks for the info, I don't have any additional specific tests. I've merged this to get some additional miles on it.
zfs_arc_shrinker_limit (default: 10000) avoids ARC collapse due to excessive memory reclaim. However, when the kernel is in direct reclaim mode (i.e. low on memory), limiting ARC reclaim increases OOM risk. This is especially true on systems without (or with inadequate) swap. This patch ignores zfs_arc_shrinker_limit when the kernel is in direct reclaim mode, avoiding most OOMs. It also restores the ability of "echo 3 > /proc/sys/vm/drop_caches" to correctly drop (almost) all ARC. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Adam Moss <[email protected]> Signed-off-by: Gionatan Danti <[email protected]> Closes openzfs#16313
As predicted, this does not fix all the issues: #10255.
Motivation and Context
zfs_arc_shrinker_limit (default: 10000) avoids ARC collapse due to excessive memory reclaim. However, when the kernel is in direct reclaim mode (i.e. low on memory), limiting ARC reclaim increases OOM risk. This is especially true on systems without (or with inadequate) swap.
Description
This patch ignores zfs_arc_shrinker_limit when the kernel is in direct reclaim mode, avoiding most OOMs. It also restores the ability of echo 3 > /proc/sys/vm/drop_caches to correctly drop (almost) all ARC.
How Has This Been Tested?
Minimal testing on a Debian 12 virtual machine.